automatic segmentation
The Impact of Prosodic Segmentation on Speech Synthesis of Spontaneous Speech
Galdino, Julio Cesar, Leal, Sidney Evaldo, De Souza, Leticia Gabriella, Lima, Rodrigo de Freitas, Moreira, Antonio Nelson Fornari Mendes, Junior, Arnaldo Candido, Oliveira, Miguel Jr., Casanova, Edresson, Aluísio, Sandra M.
Spontaneous speech presents several challenges for speech synthesis, particularly in capturing the natural flow of conversation, including turn-taking, pauses, and disfluencies. Although speech synthesis systems have made significant progress in generating natural and intelligible speech, primarily through architectures that implicitly model prosodic features such as pitch, intensity, and duration, the construction of datasets with explicit prosodic segmentation and their impact on spontaneous speech synthesis remains largely unexplored. This paper evaluates the effects of manual and automatic prosodic segmentation annotations in Brazilian Portuguese on the quality of speech synthesized by a non-autoregressive model, FastSpeech 2. Experimental results show that training with prosodic segmentation produced slightly more intelligible and acoustically natural speech. While automatic segmentation tends to create more regular segments, manual prosodic segmentation introduces greater variability, which contributes to more natural prosody. Analysis of neutral declarative utterances showed that both training approaches reproduced the expected nuclear accent pattern, but the prosodic model aligned more closely with natural pre-nuclear contours. To support reproducibility and future research, all datasets, source code, and trained models are publicly available under the CC BY-NC-ND 4.0 license.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input
Mateusz Malinowski, Mario Fritz
We propose a method for automatically answering questions about images by bringing together recent advances from natural language processing and computer vision. We combine discrete reasoning with uncertain predictions by a multi-world approach that represents uncertainty about the perceived world in a Bayesian framework. Our approach can handle human questions of high complexity about realistic scenes and replies with a range of answers such as counts, object classes, instances, and lists of them. The system is directly trained from question-answer pairs. We establish a first benchmark for this task that can be seen as a modern attempt at a visual Turing test.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany > Saarland > Saarbrücken (0.04)
- Asia > Middle East > Jordan (0.04)
Head and Neck Tumor Segmentation of MRI from Pre- and Mid-radiotherapy with Pre-training, Data Augmentation and Dual Flow UNet
Wang, Litingyu, Liao, Wenjun, Zhang, Shichuan, Wang, Guotai
Head and neck tumors and metastatic lymph nodes are crucial for treatment planning and prognostic analysis. Accurate segmentation and quantitative analysis of these structures require pixel-level annotation, making automated segmentation techniques essential for the diagnosis and treatment of head and neck cancer. In this study, we investigated the effects of multiple strategies on the segmentation of pre-radiotherapy (pre-RT) and mid-radiotherapy (mid-RT) images. For the segmentation of pre-RT images, we utilized: 1) a fully supervised learning approach, and 2) the same approach enhanced with pre-trained weights and the MixUp data augmentation technique. For mid-RT images, we introduced a novel, computationally friendly network architecture that features separate encoders for mid-RT images and registered pre-RT images with their labels. The mid-RT encoder branch integrates information from pre-RT images and labels progressively during forward propagation. We selected the highest-performing model from each fold and used their predictions to create an ensemble average for inference. In the final test, our models achieved a segmentation performance of 82.38% for pre-RT and 72.53% for mid-RT on aggregated Dice Similarity Coefficient (DSC) as HiLab. Our code is available at https://github.com/WltyBY/HNTS-MRG2024_train_code.
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Nuclear Medicine (1.00)
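The abstract above reports results as an aggregated Dice Similarity Coefficient but does not spell out the formula. The sketch below assumes one common definition, in which overlaps and mask sizes are summed over all cases before the ratio is taken, rather than averaging per-case scores. The function name, the flattened 0/1 mask representation, and the choice to return 1.0 when every mask is empty are illustrative assumptions, not taken from the paper or its code.

```python
from typing import Iterable, Sequence, Tuple

# Assumption: each mask is a flattened sequence of 0/1 voxel labels.
Mask = Sequence[int]


def aggregated_dsc(pairs: Iterable[Tuple[Mask, Mask]]) -> float:
    """Aggregated Dice: 2 * sum(|P ∩ T|) / sum(|P| + |T|) over all cases."""
    intersection = 0
    total = 0
    for pred, truth in pairs:
        if len(pred) != len(truth):
            raise ValueError("prediction and ground truth must have equal size")
        # Voxels labelled foreground in both masks.
        intersection += sum(1 for p, t in zip(pred, truth) if p and t)
        # Foreground voxels in each mask, counted separately.
        total += sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Assumed convention: all masks empty counts as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


cases = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),  # partial overlap
    ([0, 0, 0, 0], [0, 0, 1, 1]),  # lesion missed entirely
]
print(aggregated_dsc(cases))
```

Pooling the counts means a case with an empty ground truth contributes its false positives to the denominator instead of producing an undefined per-case score.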
Enhanced segmentation of femoral bone metastasis in CT scans of patients using synthetic data generation with 3D diffusion models
Saillard, Emile, Levillain, Aurélie, Mitton, David, Pialat, Jean-Baptiste, Confavreux, Cyrille, Follet, Hélène, Grenier, Thomas
Purpose: Bone metastases have a major impact on the quality of life of patients and they are diverse in terms of size and location, making their segmentation complex. Manual segmentation is time-consuming, and expert segmentations are subject to operator variability, which makes obtaining accurate and reproducible segmentations of bone metastases on CT scans a challenging yet important task. Materials and Methods: Deep learning methods tackle segmentation tasks efficiently but require large datasets along with expert manual segmentations to generalize to new images. We propose an automated data synthesis pipeline using 3D Denoising Diffusion Probabilistic Models (DDPM) to enhance the segmentation of femoral metastases from CT-scan volumes of patients. We used 29 existing lesions along with 26 healthy femurs to create new realistic synthetic metastatic images, and trained a DDPM to improve the diversity and realism of the simulated volumes. We also investigated the operator variability on manual segmentation. Results: We created 5675 new volumes, then trained 3D U-Net segmentation models on real and synthetic data to compare segmentation performance, and we evaluated the performance of the models depending on the amount of synthetic data used in training. Conclusion: Our results showed that segmentation models trained with synthetic data outperformed those trained on real volumes only, and that those models perform especially well when considering operator variability.
- Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.05)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Health & Medicine > Therapeutic Area > Oncology > Bone Cancer (0.83)
Automatic identification of the area covered by acorn trees in the dehesa (pastureland) Extremadura of Spain
Ojeda-Magaña, Benjamin, Ruelas, Ruben, Quintanilla-Dominguez, Joel, Gomez-Barba, Leopoldo, Lopez de Herrera, Juan, Robledo-Hernandez, Jose, Tarquis, Ana
The acorn is the fruit of the oak and is an important crop in the Spanish dehesa extremeña, especially for the value it provides as feed for Iberian pigs to obtain the "acorn" certification. For this reason, we want to maximise the production of Iberian pigs with the appropriate weight. Hence the need to know the area covered by the crowns of the acorn trees, to determine the covered wooded area (CWA, from the Spanish Superficie Arbolada Cubierta, SAC) and thereby estimate the number of Iberian pigs that can be released per hectare, as indicated by Royal Decree 4/2014. In this work, we propose the automatic estimation of the CWA from aerial digital images (orthophotos) of the pastureland of Extremadura, which in turn makes it possible to determine the number of Iberian pigs to be released in a specific plot of land. The main issues for automatic detection are, first, correctly identifying the acorn trees; second, correctly discriminating the shadows of the acorn trees; and finally, detecting the arbuscles (young acorn trees not yet productive, or shrubs that are not oaks). These difficulties represent a real challenge, both for the automatic segmentation process and for manual segmentation. In this work, the proposed method for automatic segmentation is based on the Gustafson-Kessel (GK) clustering algorithm, in the version modified by Babuska (GK-B), and on the use of real orthophotos. The obtained results are promising both in their comparison with the real images and when compared with the images segmented by hand. The whole set of orthophotos used in this work corresponds to an approximate area of 142 hectares, and the results are of great interest to producers of certified "acorn" pork.
- North America > Mexico (0.05)
- Europe > Spain > Galicia > Madrid (0.05)
- North America > United States > New York (0.04)
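Once tree crowns have been segmented, the covered wooded area described in the abstract above follows from a pixel count. The sketch below is only an illustration of that last step: it assumes a binary mask (1 = crown pixel) and a known, square ground pixel size in metres, neither of which is specified in the abstract. The function names are hypothetical, and the stocking rule of the decree is deliberately not modelled.

```python
from typing import Sequence

# Assumption: the segmentation result is a 2D grid of 0/1 values, 1 = tree crown.
BinaryMask = Sequence[Sequence[int]]


def covered_wooded_area_ha(mask: BinaryMask, pixel_size_m: float) -> float:
    """Area covered by crown pixels, in hectares (1 ha = 10,000 m^2)."""
    covered_pixels = sum(1 for row in mask for px in row if px)
    return covered_pixels * pixel_size_m ** 2 / 10_000.0


def covered_fraction(mask: BinaryMask) -> float:
    """Share of the plot's pixels that are labelled as crown."""
    total_pixels = sum(len(row) for row in mask)
    if total_pixels == 0:
        raise ValueError("mask is empty")
    covered_pixels = sum(1 for row in mask for px in row if px)
    return covered_pixels / total_pixels


# Toy plot: 2 x 2 pixels of 50 m each, three of them covered by crowns.
plot = [[1, 1], [1, 0]]
print(covered_wooded_area_ha(plot, pixel_size_m=50.0))
print(covered_fraction(plot))
```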
MAMA-MIA: A Large-Scale Multi-Center Breast Cancer DCE-MRI Benchmark Dataset with Expert Segmentations
Garrucho, Lidia, Reidel, Claire-Anne, Kushibar, Kaisar, Joshi, Smriti, Osuala, Richard, Tsirikoglou, Apostolia, Bobowicz, Maciej, del Riego, Javier, Catanese, Alessandro, Gwoździewicz, Katarzyna, Cosaka, Maria-Laura, Abo-Elhoda, Pasant M., Tantawy, Sara W., Sakrana, Shorouq S., Shawky-Abdelfatah, Norhan O., Abdo-Salem, Amr Muhammad, Kozana, Androniki, Divjak, Eugen, Ivanac, Gordana, Nikiforaki, Katerina, Klontzas, Michail E., García-Dosdá, Rosa, Gulsun-Akpinar, Meltem, Lafcı, Oğuz, Mann, Ritse, Martín-Isla, Carlos, Prior, Fred, Marias, Kostas, Starmans, Martijn P. A., Strand, Fredrik, Díaz, Oliver, Igual, Laura, Lekadir, Karim
Current research in breast cancer Magnetic Resonance Imaging (MRI), especially with Artificial Intelligence (AI), faces challenges due to the lack of expert segmentations. To address this, we introduce the MAMA-MIA dataset, comprising 1506 multi-center dynamic contrast-enhanced MRI cases with expert segmentations of primary tumors and non-mass enhancement areas. These cases were sourced from four publicly available collections in The Cancer Imaging Archive (TCIA). Initially, we trained a deep learning model to automatically segment the cases, generating preliminary segmentations that significantly reduced expert segmentation time. Sixteen experts, averaging 9 years of experience in breast cancer, then corrected these segmentations, resulting in the final expert segmentations. Additionally, two radiologists conducted a visual inspection of the automatic segmentations to support future quality control studies. Alongside the expert segmentations, we provide 49 harmonized demographic and clinical variables and the pretrained weights of the well-known nnUNet architecture trained using the DCE-MRI full-images and expert segmentations. This dataset aims to accelerate the development and benchmarking of deep learning models and foster innovation in breast cancer diagnostics and treatment planning.
- Europe > Austria > Vienna (0.14)
- Europe > Greece (0.05)
- Europe > Netherlands > South Holland > Rotterdam (0.04)
- Research Report > Experimental Study (0.46)
- Research Report > New Finding (0.46)
- Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
LISBET: a self-supervised Transformer model for the automatic segmentation of social behavior motifs
Chindemi, Giuseppe, Girard, Benoit, Bellone, Camilla
Social behavior, defined as the process by which individuals act and react in response to others, is crucial for the function of societies and holds profound implications for mental health. To fully grasp the intricacies of social behavior and identify potential therapeutic targets for addressing social deficits, it is essential to understand its core principles. Although machine learning algorithms have made it easier to study specific aspects of complex behavior, current methodologies tend to focus primarily on single-animal behavior. In this study, we introduce LISBET (seLf-supervIsed Social BEhavioral Transformer), a model designed to detect and segment social interactions. Our model eliminates the need for feature selection and extensive human annotation by using self-supervised learning to detect and quantify social behaviors from dynamic body parts tracking data. LISBET can be used in hypothesis-driven mode to automate behavior classification using supervised finetuning, and in discovery-driven mode to segment social behavior motifs using unsupervised learning. We found that motifs recognized using the discovery-driven approach not only closely match the human annotations but also correlate with the electrophysiological activity of dopaminergic neurons in the Ventral Tegmental Area (VTA). We hope LISBET will help the community improve our understanding of social behaviors and their neural underpinnings.
- Europe > Switzerland > Geneva > Geneva (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.30)
How Good is Automatic Segmentation as a Multimodal Discourse Annotation Aid?
Terpstra, Corbyn, Khebour, Ibrahim, Bradford, Mariah, Wisniewski, Brett, Krishnaswamy, Nikhil, Blanchard, Nathaniel
Collaborative problem solving (CPS) in teams is tightly coupled with the creation of shared meaning between participants in a situated, collaborative task. In this work, we assess the quality of different utterance segmentation techniques as an aid in annotating CPS. We (1) manually transcribe utterances in a dataset of triads collaboratively solving a problem involving dialogue and physical object manipulation, (2) annotate collaborative moves according to these gold-standard transcripts, and then (3) apply these annotations to utterances that have been automatically segmented using toolkits from Google and OpenAI's Whisper. We show that the oracle utterances have minimal correspondence to automatically segmented speech, and that automatically segmented speech using different segmentation methods is also inconsistent. We also show that annotating automatically segmented speech has distinct implications compared with annotating oracle utterances--since most annotation schemes are designed for oracle cases, when annotating automatically-segmented utterances, annotators must invoke other information to make arbitrary judgments which other annotators may not replicate. We conclude with a discussion of how future annotation specs can account for these needs.
DeepEdit: Deep Editable Learning for Interactive Segmentation of 3D Medical Images
Diaz-Pinto, Andres, Mehta, Pritesh, Alle, Sachidanand, Asad, Muhammad, Brown, Richard, Nath, Vishwesh, Ihsani, Alvin, Antonelli, Michela, Palkovics, Daniel, Pinter, Csaba, Alkalay, Ron, Pieper, Steve, Roth, Holger R., Xu, Daguang, Dogra, Prerna, Vercauteren, Tom, Feng, Andrew, Quraini, Abood, Ourselin, Sebastien, Cardoso, M. Jorge
Automatic segmentation of medical images is a key step for diagnostic and interventional tasks. However, achieving this requires large amounts of annotated volumes, which can be a tedious and time-consuming task for expert annotators. In this paper, we introduce DeepEdit, a deep learning-based method for volumetric medical image annotation that allows automatic and semi-automatic segmentation, and click-based refinement. DeepEdit combines the power of two methods, a non-interactive method (i.e. automatic segmentation using nnU-Net, UNET or UNETR) and an interactive segmentation method (i.e. DeepGrow), into a single deep learning model. It allows easy integration of uncertainty-based ranking strategies (i.e. aleatoric and epistemic uncertainty computation) and active learning. We propose and implement a method for training DeepEdit by using standard training combined with user interaction simulation. Once trained, DeepEdit allows clinicians to quickly segment their datasets by using the algorithm in auto segmentation mode or by providing clicks via a user interface (e.g. 3D Slicer, OHIF). We show the value of DeepEdit through evaluation on the PROSTATEx dataset for prostate/prostatic lesions and the Multi-Atlas Labeling Beyond the Cranial Vault (BTCV) dataset for abdominal CT segmentation, using state-of-the-art network architectures as baselines for comparison. DeepEdit could reduce the time and effort of annotating 3D medical images compared to DeepGrow alone. Source code is available at https://github.com/Project-MONAI/MONAILabel
- North America > United States > California > Santa Clara County > Santa Clara (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Spain > Canary Islands (0.04)
Automatic Segmentation of Aircraft Dents in Point Clouds
Lafiosca, Pasquale, Fan, Ip-Shing, Avdelidis, Nicolas P.
Dents on the aircraft skin are frequent and may easily go undetected during airworthiness checks, as their inspection process is tedious and highly subject to human factors and environmental conditions. Nowadays, 3D scanning technologies are being proposed for more reliable, human-independent measurements, yet the process of inspection and reporting remains laborious and time-consuming because data acquisition and validation are still carried out by the engineer. For full automation of dent inspection, the acquired point cloud data must be analysed via a reliable segmentation algorithm, releasing humans from the search and evaluation of damage. This paper reports on two developments towards automated dent inspection. The first is a method to generate a synthetic dataset of dented surfaces to train a fully convolutional neural network. The training of machine learning algorithms needs a substantial volume of dent data, which is not readily available. Dents are thus simulated in random positions and shapes, within the criteria and definitions of a Boeing 737 structural repair manual. The noise distribution from the scanning apparatus is then added so that the training data reflect the complete process of 3D point acquisition. The second proposition is a surface fitting strategy to convert 3D point clouds to 2.5D. This allows higher resolution point clouds to be processed with a small amount of memory compared with state-of-the-art methods involving 3D sampling approaches. Simulations with available ground truth data show that the proposed technique reaches an intersection-over-union of over 80%. Experiments over dent samples prove an effective detection of dents with a speed of over 500 000 points per second.
- Aerospace & Defense > Aircraft (1.00)
- Transportation > Air (0.87)
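The intersection-over-union figure quoted in the abstract above compares a predicted dent mask with ground truth. A minimal sketch of the metric follows, assuming per-point binary labels (1 = dent) stored as flat sequences. The function name and the choice to return 1.0 when both masks are empty are illustrative assumptions, not details from the paper.

```python
from typing import Sequence


def intersection_over_union(pred: Sequence[int], truth: Sequence[int]) -> float:
    """IoU = |pred AND truth| / |pred OR truth| for binary per-point labels."""
    if len(pred) != len(truth):
        raise ValueError("prediction and ground truth must have equal size")
    # Points labelled as dent in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Points labelled as dent in at least one mask.
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    if union == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return intersection / union


predicted = [1, 1, 1, 0, 0]
reference = [0, 1, 1, 1, 0]
print(intersection_over_union(predicted, reference))
```

Unlike the Dice coefficient, IoU counts the overlap only once in the denominator, so it is never higher than Dice for the same pair of masks.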